In a wide range of business areas dealing with text data streams, including CRM, knowledge
management, and Web monitoring services, it is an important issue to discover topic trends
and analyze their dynamics in real-time. Specifically we consider the following three tasks in
topic trend analysis: 1) Topic Structure Identification: identifying what kinds of main topics
exist and how important they are, 2) Topic Emergence Detection: detecting the emergence
of a new topic and recognizi ...



Keywords: CRM, model selection, text mining, topic analysis 



2 Poster papers: A unifying framework for detecting outliers and change points from
non-stationary time series data

Kenji Yamanishi, Jun-ichi Takeuchi

July 2002 Proceedings of the eighth ACM SIGKDD international conference on 
Knowledge discovery and data mining 

Full text available: pdf(572.91 KB) Additional Information: full citation, abstract, references, citings, index terms

We are concerned with the issues of outlier detection and change point detection from a 
data stream. In the area of data mining, there has been increased interest in these issues
since the former is related to fraud detection, rare event discovery, etc., while the latter is
related to event/trend by change detection, activity monitoring, etc. Specifically, it is
important to consider the situation where the data source is non-stationary, since the nature
of data source may change over time in r ...

This paper is concerned with the problem of detecting outliers from unlabeled data. In prior
work we have developed SmartSifter, which is an on-line outlier detection algorithm based
on unsupervised learning from data. On the basis of SmartSifter this paper yields a new 
framework for outlier filtering using both supervised and unsupervised learning techniques 
iteratively in order to make the detection process more effective and more understandable. 
The outline of the framework is as follows: In ...

We are developing a technique to predict travel time of a vehicle for an objective road
section, based on real time traffic data collected through a probe-car system. In the area of 
Intelligent Transport System (ITS), travel time prediction is an important subject. Probe-car 
system is an upcoming data collection method, in which a number of vehicles are used as 
moving sensors to detect actual traffic situation. It can collect data concerning much larger 
area, compared with traditional fixed dete ...

We report on an automated runtime anomaly detection method at the application layer of
multi-node computer systems. Although several network management systems are available 
in the market, none of them have sufficient capabilities to detect faults in multi-tier Web- 
based systems with redundancy. We model a Web-based system as a weighted graph, 
where each node represents a "service" and each edge represents a dependency between 
services. Since the edge weights vary greatly over time, the problem ...

With the continuous evolution of the types of attacks against computer networks, traditional
intrusion detection systems, based on pattern matching and static signatures, are 
increasingly limited by their need of an up-to-date and comprehensive knowledge base. 
Data mining techniques have been successfully applied in host-based intrusion detection. 
Applying data mining techniques on raw network data, however, is made difficult by the 
sheer size of the input; this is usually avoided by discarding ...

We argue that there are many clustering algorithms, because the notion of "cluster" cannot
be precisely defined. Clustering is in the eye of the beholder, and as such, researchers have 
proposed many induction principles and models whose corresponding optimization problem 
can only be approximately solved by an even larger number of algorithms. Therefore, 
comparing clustering algorithms must take into account a careful understanding of the
inductive principles involved.

Outlier detection has recently become an important problem in many industrial and financial
applications. In this paper, a novel feature bagging approach for detecting outliers in very 
large, high dimensional and noisy databases is proposed. It combines results from multiple 
outlier detection algorithms that are applied using different sets of features. Every outlier
detection algorithm uses a small subset of features that are randomly selected from the
original feature set. As a result, each out ...

Full text available: pdf(162.48 KB) Additional Information: full citation, abstract

The outlier detection problem has important applications in the field of fraud detection, 
network robustness analysis, and intrusion detection. Most such applications are most
important for high-dimensional domains in which the data can contain hundreds of
dimensions. Many recent algorithms have been proposed for outlier detection that use
several concepts of proximity in order to find the outliers based on their relationship to the
other points in the data. However, in high-dimensional space, t ...

November 2004 Proceedings of the thirteenth ACM conference on Information and
knowledge management 

Full text available: pdf(265.46 KB) Additional Information: full citation, abstract, references, index terms

"One person's noise is another person's signal". Outlier detection is used to clean up 
datasets and also to discover useful anomalies, such as criminal activities in electronic 
commerce, computer intrusion attacks, terrorist threats, agricultural pest infestations, etc. 
Thus, outlier detection is critically important in the information-based society. This paper 
focuses on finding outliers in large datasets using distance-based methods. First, to speed up
outlier detections, we revise Knorr and ...

The outlier detection problem has important applications in the field of fraud detection,
network robustness analysis, and intrusion detection. Most such applications are high 
dimensional domains in which the data can contain hundreds of dimensions. Many recent 
algorithms use concepts of proximity in order to find outliers based on their relationship to 
the rest of the data. However, in high dimensional space, the data is sparse and the notion 
of proximity fails to retain its meaningfulness. ...

Full text available: pdf(769.56 KB)

We propose a new method of classifying documents into categories. We define for each
category a finite mixture model based on soft clustering of words. We treat the problem of 
classifying documents as that of conducting statistical hypothesis testing over finite mixture 
models, and employ the EM algorithm to efficiently estimate parameters in a finite mixture 
model. Experimental results indicate that our method outperforms existing methods. 

Artificial intelligence #2: Network flow for outlier detection

Ying Liu, Alan P. Sprague, Elliot Lefkowitz

April 2004 Proceedings of the 42nd annual Southeast regional conference 

Full text available: pdf(255.64 KB) Additional Information: full citation, abstract, references, index terms

Detecting outliers is an important topic in data mining. Sometimes the outliers are more 
interesting than the rest of the data. Outlier identification has lots of applications, such as 
intrusion detection, and unusual usage of credit cards or telecommunication services. In this
paper, we propose a novel method for outlier identification which is based on network flow.
We use the well known Maximum Flow Minimum Cut theorem from graph theory to find the
outliers and strong outlier groups. Especial ...

Keywords: Maximum Flow Minimum Cut, data mining, graph theory, network flow, outlier 
detection

March 2004 Proceedings of the 2004 ACM symposium on Applied computing

Full text available: pdf(370.45 KB) Additional Information: full citation, abstract, references

The behavior of spatial objects is under the influence of nearby spatial processes. Therefore 
in order to perform any type of spatial analysis we need to take into account not only the 
spatial relationships among objects but also the underlying spatial processes and other 
spatial features in the vicinity that influence the behavior of a given spatial object. In this 
paper, we address the outlier detection by refining the concept of a neighborhood of an 
object, which essentially characterizes sim ...

Full text available: pdf(59Q.38 KB)

Identification of outliers can lead to the discovery of unexpected, interesting, and useful
knowledge. Existing methods are designed for detecting spatial outliers in multidimensional 
geometric data sets, where a distance metric is available. In this paper, we focus on 
detecting spatial outliers in graph structured data sets. We define statistical tests, analyze 
the statistical foundation underlying our approach, design several fast algorithms to detect 
spatial outliers, and provide a cost model ... 

Keywords: Outlier Detection, Spatial Data Mining, Spatial Graphs 



17 Research track posters: Locating secret messages in images 
Ian Davidson, Goutam Paul 

August 2004 Proceedings of the tenth ACM SIGKDD international conference on 
Knowledge discovery and data mining KDD '04

Steganography involves hiding messages in innocuous media such as images, while
steganalysis is the field of detecting these secret messages. The ultimate goal of 
steganalysis is two-fold: making a binary classification of a file as stego-bearing or innocent, 
and secondly, locating the hidden message with an aim to extracting, sterilizing or 
manipulating it. Almost all steganalysis approaches (known as attacks) focus on the first of 
these two issues. In this paper, we explore the difficult relat ...

Full text available: pdf(899.15 KB) Additional Information: full citation, abstract, references, index terms

Covariance and correlation estimates have important applications in data mining. In the 
presence of outliers, classical estimates of covariance and correlation matrices are not 
reliable. A small fraction of outliers, in some cases even a single outlier, can distort the 
classical covariance and correlation estimates making them virtually useless. That is, 
correlations for the vast majority of the data can be very erroneously reported; principal 
components transformations can be misleading; and mu ...

Spatial outliers are the spatial objects with distinct features from their surrounding
neighbors. Detection of spatial outliers helps reveal important and valuable information from
large spatial data sets. In the field of meteorology, for example, spatial outliers can be
associated with disastrous natural events such as tornadoes, hurricanes, and forest fires.
Previous studies of spatial outliers mainly focus on point data. However, in the
meteorological data or other applications, spatial outlier ...

This paper discusses the problem of fitting mixture models to input data. When an input
stream is an amalgam of data from different sources then such mixture models must be
used if the true nature of the data is to be properly represented. A key problem is then to 
identify the different components of such a mixture, and in particular to determine how 
many components there are. This is known to be a non-regular/non-standard problem in the 
statistical sense and is technically notoriously diffic ...

Active Learning,"

The Foundations of Real-World Intelligence, Oct. 2001.

13. K.Yamanishi: "Data and Text Mining," (in Japanese) 

in Iwanami: Statistical Science Frontier Series, Mar. 2003. 

14. S.Morinaga and K.Yamanishi: "Text Mining and Its Applications to Free Survey Data 
Analysis" (in Japanese) 

Journal of the Society of Instrument and Control Engineers, Vol.41, No.5, pp:354-357,
2002.

15. K.Yamanishi: "New Trend of Data and Text Mining-Outlier Detection and Reputation
Mining" (in Japanese)

Applied Mathematics, vol.12, No.4, p.7-22, 2002.

16. K.Yamanishi : "Extended Stochastic Complexity and Its Applications to Learning" 
to appear in Advances in Minimum Description Length: Theory and Applications, The
MIT Press 

17. K.Yamanishi, J.Takeuchi, Y.Matsunaga: "Security Mining" 

NEC Technical Journal, Special Issue on Security, vol.56, No.12, pp:41-45, NEC
Corporation, 2003. 

18. T.Egawa, M.Kobayashi, K.Yamanishi, A.Arutaki, J.Namiki: "Dynamic Collaboration from 
Scientists' Eyes," 

Journal of Advanced Technology, pp:17-26, vol.1, No.1, 2004.

19. K.Yamanishi, J.Takeuchi, Y.Maruyama: "Three Types of Statistical Anomaly Detection,"

Information Processing, vol.46, No.1, pp:34-40, 2005.

20. K.Yamanishi, J.Takeuchi, Y.Maruyama: "Data Mining for Security, " 
Journal of Advanced Technology, Vol.2, No.1, pp:63-69, 2005. 

21. K.Yamanishi and S.Morinaga: "Data Mining for Knowledge Organization," 
Journal of Advanced Technology, Vol.2, No. 2 , pp:129-136, 2005. 

Refereed Conference Papers

1. K.Yamanishi: "On New Asymptotic Performance Evaluation of Binary Modular Codes,"
presented at 1988 IEEE International Symposium on Information Theory (ISIT88), Kobe
Japan, June 1988.

2. K.Yamanishi: "Inferring Optimal Decision Lists from Stochastic Data Using the Minimum
Description Length Criterion," 

presented at 1990 IEEE International Symposium on Information Theory (ISIT90), San
Diego, CA, Jan. 1990.

3. K.Yamanishi: "A Learning Criterion for Stochastic Rules," 

Proceedings of the Third Annual Workshop on Computational Learning Theory
(COLT90), pp.67-81, Morgan Kaufmann, 1990.

4. K.Yamanishi and A.Konagaya: "Learning Stochastic Motifs from Genetic Sequences,"
Proceedings of the Eighth International Workshop on Machine Learning (ML91),
pp.467-471, Morgan Kaufmann, 1991.

5. K.Yamanishi: "A Loss Bound Model for On-Line Stochastic Prediction Strategies,"
Proceedings of the Fourth Annual Workshop on Computational Learning Theory 
(COLT91), pp.290-302, Morgan Kaufmann, 1991. 

6. A.Konagaya and K.Yamanishi: "Stochastic Decision Predicates: A New Scheme to 
Represent Motifs," 

presented at AAAI Workshop on AI and Molecular Biology, 1991.

7. K.Yamanishi: "Learning Non-parametric Densities by Finite Dimensional Parametric 
Hypotheses," 

Proceedings of the Second Annual Workshop on Algorithmic Learning Theory (ALT92),
pp.175-186, JSAI Press, 1992.

8. K.Yamanishi: "Probably Almost Discriminative Learning," 

Proceedings of the Fifth Annual ACM Workshop on Computational Learning Theory 
(COLT92), pp.164-171, ACM Press, 1992. 

9. H.Mamitsuka and K.Yamanishi: "Protein Secondary Structure Prediction Based on 
Stochastic-Rule Learning," 

Proceedings of the Third Annual Workshop on Algorithmic Learning Theory (ALT92),
pp.240-251, 1993.

10. H.Mamitsuka and K.Yamanishi: "Protein $\alpha$-Helix Region Prediction Based on 
Stochastic-Rule Learning," 

Proceedings of the Twenty-Sixth Annual Hawaii International Conference on System
Sciences (ICSS93), p.659-668, IEEE Computer Society Press, 1993.

11. K.Yamanishi: "On Polynomial-time Probably Almost Discriminative Learnability,"
Proceedings of the Sixth Annual ACM Conference on Computational Learning Theory 
(COLT93), pp.94-100, ACM Press, 1993.

12. K.Yamanishi: "Learning Non-parametric Smooth Rules by Stochastic Rules with Finite 
Partitioning," 

Computational Learning Theory: EuroCOLT'93, pp.217-228, Oxford, 1994.

13. K.Yamanishi: "On-Line Prediction Based on the Extended Stochastic Complexity," 
presented at Workshop on Descriptional Complexity, organized by E.Pednault, 
New Brunswick, NJ, 1994.

14. K.Yamanishi: "The Minimum L-complexity Algorithm and Its Applications to Learning 
Non-parametric Rules," 

Proceedings of the Seventh Annual ACM Workshop on Computational Learning Theory 
(COLT94), pp.173-182, ACM Press, 1994.

15. K.Yamanishi: "On-Line Maximum Likelihood Prediction with respect to General Loss 
Functions," 

Lecture Notes in Artificial Intelligence 904, Computational Learning Theory: Second 
European Conference, EuroCOLT'95, pp.84-98, Springer, 1995.

16. K.Yamanishi: "Randomized Approximate Aggregating Strategies and Their Applications 
to Prediction and Discrimination," 

Proceedings of the Eighth Annual Conference on Computational Learning Theory
(COLT95), pp.83-90, 1995. 

17. K.Yamanishi: "A Randomized Approximation of the MDL for Stochastic Models with 
Hidden Variables," 

Proceedings of the Eighth Annual Conference on Computational Learning Theory
(COLT96), pp.99-109, ACM Press, 1996. 

18. K.Yamanishi: "Distributed Cooperative Bayesian Learning Strategies," 
Proceedings of the Tenth Annual Conference on Computational Learning Theory 
(COLT97), pp.250-262, ACM Press, 1997. 

19. H.Li and K.Yamanishi: "Document Classification Using A Finite Mixture Model,"
Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics
(ACL97), p.39-47, Morgan Kaufmann, 1997.

20. K.Yamanishi: "Minimax Relative Sequence Analysis for Sequential Prediction Algorithms 
Using Parametric Hypotheses," 

Proceedings of the 11th Annual Conference on Computational Learning Theory 
(COLT98), pp.32-43, 1998.

21. H.Li and K.Yamanishi: "Text Classification Using ESC-Based Decision Lists,"
Proceedings of International Conference on Information & Knowledge Management
(CIKM99), pp.122-130, 1999.

22. K.Yamanishi: "Extended Stochastic Complexity in Individual Sequence Analysis," 
Proceedings of the 1999 Workshop on Information-Based Induction Sciences (IBIS99),
pp.163-168, 1999.

23. H.Li and K.Yamanishi: "Text Classification Using ESC-Based Decision Lists," (in
Japanese) 

Proceedings of the 1999 Workshop on Information-Based Induction Sciences(IBIS99), 
pp.239-244, 1999. 

24. K.Yamanishi, J.Takeuchi, G.Williams, and P.Milne: "On-line Unsupervised Outlier
Detection Using Finite Mixtures with Discounting Learning Algorithms,"

in Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge
Discovery and Data Mining (KDD2000), ACM Press, pp:320-324, 2000.

25. K.Yamanishi and J.Takeuchi: "Statistical Outlier Detection Using On-line Discounting
Learning Algorithms," (in Japanese)

Proceedings of the 2000 Workshop on Information-Based Induction Sciences(IBIS2000), 
2000. 

26. H.Li and K.Yamanishi: "Statistical and Lexical Topic Analysis Using a Finite Mixture 
Model,"(in Japanese) 

Proceedings of the 2000 Workshop on Information-Based Induction Sciences(IBIS2000), 
2000. 

27. H.Li and K.Yamanishi: "Statistical and Lexical Topic Analysis Using a Finite Mixture
Model,"

Proceedings of ACL Workshop on Very Large Corpora, 2000. 

28. K.Yamanishi and J.Takeuchi: "Discovering Outlier Filtering Rules From Unlabeled Data-
Combining Supervised Learners with Unsupervised Learners-,"

Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge
Discovery and Data Mining (KDD2001), ACM Press, pp:389-394, 2001.

29. H.Li and K.Yamanishi: "Mining from Open Answers in Questionnaire Data."
Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge 
Discovery and Data Mining(KDD2001), ACM Press, pp:443-449, 2001. 

30. K.Yamanishi and J.Takeuchi: "Discovering Outlier Filtering Rules From Unlabeled
Data," (in Japanese)

Proceedings of the 2001 Workshop on Information-Based Induction Sciences
(IBIS2001), pp:111-116, 2001.

31. H.Li and K.Yamanishi: "A Statistical Approach to Analyzing Open Answers in
Questionnaire Data," (in Japanese)

Proceedings of the 2001 Workshop on Information-Based Induction Sciences
(IBIS2001), pp:129-134, 2001.

32. K.Yamanishi and J.Takeuchi: "A Unifying Approach to Detecting Outliers and Change-
Points from Nonstationary Data,"

Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge 
Discovery and Data Mining(KDD2002), ACM Press, 2002. 

33. S.Morinaga, K.Yamanishi, K.Tateishi, and T.Fukushima: "Mining Product Reputations on
the Web,"

Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge 
Discovery and Data Mining(KDD2002), ACM Press, 2002. 

34. J.Takeuchi and K.Yamanishi: "A Unifying Approach to Detecting Outliers and Change
Points Using Discounting Learning Algorithms," (in Japanese)

Proceedings of the 2002 Workshop on Information-Based Induction Sciences
(IBIS2002), 2002.

35. S.Morinaga, K.Yamanishi, J.Takeuchi: "Distributed Cooperative Mining from Different
Information Sources," (in Japanese)

Proceedings of the 2002 Workshop on Information-Based Induction Sciences 
(IBIS2002), 2002.

36. S.Morinaga, K.Yamanishi, J.Takeuchi: "Distributed Cooperative Mining for Information
Consortia,"

Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge 
Discovery and Data Mining(KDD2003), ACM Press, 2003. 

37. Y.Matsunaga and K.Yamanishi: "An Information-theoretic Approach to Detecting
Anomalous Behaviors," (in Japanese)

Proceedings of the Second Forum on Information Technologies (FIT2003), 2003.

38. Y.Matsunaga and K.Yamanishi: "Dynamic Model Selection and Its Applications to
Anomalous Behavior Detection," (in Japanese)

Proceedings of the 2003 Workshop on Information-Based Induction Sciences 
(IBIS2003), 2003. 

39. S.Morinaga and K.Yamanishi: "Tracking Dynamics of Topic Trends Using a Finite
Mixture Model," (in Japanese)

Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge 
Discovery and Data Mining (KDD2004), ACM Press, 2004. 

40. Y.Maruyama and K.Yamanishi: "Dynamic Model Selection with Its Applications to
Computer Security,"

Proceedings of 2004 IEEE International Workshop on Information Theory, 2004.

41. Y.Maruyama and K.Yamanishi: "Dynamic Model Selection with Its Applications to
Computer Security," (in Japanese)

Proceedings of the 2004 Workshop on Information-Based Induction Sciences
(IBIS2004), pp:15-22, 2003.

42. S.Morinaga and K.Yamanishi: "Mining Dynamics of Topic Trends Using a Finite Mixture
Model," (in Japanese)

Proceedings of the 2004 Workshop on Information-Based Induction Sciences
(IBIS2004), pp:78-85, 2004.

43. K.Yamanishi and Y.Maruyama: "Dynamic Model Selection for Network Failure
Monitoring," to appear in 

Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge 
Discovery and Data Mining 
(KDD2005), ACM Press, 2005. 



Invited Conference Papers

1. K.Yamanishi: "On New Asymptotic Performance Evaluation of Binary Modular Codes,"
Proceedings of Workshop on Coding Theory, Osaka Japan, June 1988. 

2. K.Yamanishi: "Computational Learning Theory and the MDL Principle," (in Japanese) 
Proceedings of Information Theory and Its Applications Workshop, p.55-58, 1991. 

3. K.Yamanishi: "Why does the MDL give an effective learning strategy?" (in Japanese)
Proceedings of the 5th Annual Conference for JSAI, p.77-80, June 1991.

4. K.Yamanishi: "A Statistical Approach to Computational Learning Theory," 
Proceedings of the Third NEC Research Symposium, pp.238-276, SIAM, 1992.

5. K.Yamanishi: "On Complexity of MDL Learning and Discrimination," 
Proceedings of 1993 IEEE Information Theory Workshop, p.30-31, 1993.

6. K.Yamanishi: "Generalized Stochastic Complexity and Its Applications to Learning," 
Proceedings of the 1994 Conference on Information Science and Systems, vol.2,
pp.763-768, 1994.

7. K.Yamanishi: "A Decision-theoretic Extension of Stochastic Complexity and Its 
Applications to Learning," (in Japanese)

Proceedings of the 1998 Workshop on Information-Based Induction Sciences, pp.33-41,
1998.

8. K.Yamanishi: "From MDL criterion to Extended Stochastic Complexity," (in Japanese)
Proceedings of IEICE (Institute of Electronics, Information, Communication, and
Engineers), 1998.

9. K.Yamanishi: "Extended Stochastic Complexity and Minimax Relative Loss Analysis,"
Algorithmic Learning Theory: The Tenth International Conference, ALT99,
Proceedings, pp.26-38, 1999.

10. K.Yamanishi: "Information-Based Induction Sciences-Trends and Related Topics," (in
Japanese) 

Proceedings of Symposium of the Institute on System, Control, and Information 
Engineers, pp:17-24, May 2000.

11. K.Yamanishi and J.Takeuchi: "Data Mining and Business HPC," (in Japanese)
Proceedings of NEC/HPC Workshop, Tokyo Japan, Dec. 2000.

12. K.Yamanishi: "Data and Text Mining Based on Information-Based Induction 
Sciences," (in Japanese) 

Proceedings of Workshop on AI Fundamentals, Hokuriku, Japan, Mar. 2001.

13. K.Yamanishi: "Text Mining Using Stochastic Modeling of Text Data," (in Japanese)
Proceedings of Workshop on AI Symposium, Tokyo, Japan, July 2001.

14. K.Yamanishi: "Web Mining and Information-Based Induction Sciences," (in Japanese)
Proceedings of Information Science Symposium, Japan, January 2002. 

15. K.Yamanishi and J.Takeuchi: "Anomaly Detection by Data Mining and Its Applications 
to Network Intrusion Detection," (in Japanese) 

Proceedings of IEICE Information Network Research Group, Japan, June, 2002.

16. K.Yamanishi: "Web Mining and Information-Based Induction Sciences-Reputation
Mining and Log Mining," (in Japanese)

Proceedings of Information-Based Induction Sciences(IBIS2002), Japan, September, 
2002. 

17. K.Yamanishi: "Detecting Anomalies and Change-points for Cyber Threat Analysis,"
Proceedings of IEEE Workshop on Data Mining for Cyber Threat Analysis, Japan,
December, 2002.

18. K.Yamanishi: "Data Mining Realizing Security/Web Intelligence," (in Japanese)
Proceedings of AI Symposium, JSAI, pp:99-104, Japan, April, 2002.

19. K.Yamanishi: "Data Mining and Security," (in Japanese)
Proceedings of the 17th Annual Conference on JSAI, Japan, June, 2002.

20. K.Yamanishi: "Text Mining," (in Japanese)

Proceedings of the Second Forum on Information Technologies (FIT2003), 2003.

21. K.Yamanishi: "Text Mining and NLP Business," (in Japanese)
Proceedings of 2003 JEITA Symposium on Natural Language Processing-NLP
Business, 2003.

22. K.Yamanishi: "Data Mining based Security Technologies," (in Japanese)
Proceedings of Artificial Intelligence Seminar-Computer Security and AI-, 2005.

Other Invited Talks 

1. "Algebraic-Geometric Codes,"

presented for Seminar at Yokohama National University (hosted by Prof.H.Imai),
Kanagawa Japan, Feb. 1987.

2. "Algebraic-Geometric Codes,"

presented at Workshop on Combinatorial Theory and Its Applications, Tsukuba Japan,
July 1987. 

3. "Algebraic-Geometric Codes," 

presented for Seminar at Electro-Communication University (hosted by 
Prof.H.Mizuno), Tokyo Japan, Nov. 1987. 

4. "A Theory of Learning Stochastic Rules," 

presented for Seminar at University of Tokyo (hosted by Prof.S.Amari), Tokyo Japan, 
July 1991. 

5. "Learning Based on the MDL Principle," 

presented for Seminar at IBM Almaden Research Center (hosted by J.Rissanen), CA,

6. "Learning Theory and the MDL Principle,"

presented at Workshop on Pattern Recognition, University of Tokyo, Tokyo Japan, 
Feb. 1992. 

7. "Universal Discrimination Using the MDL Principle," 

presented for Seminar at Electro-Communication University (hosted by Prof.H.Morita), 
Tokyo Japan, July 1992. 

8. "The Minimum L-Complexity Algorithm and Its Applications to Learning," 
presented for Seminar at AT&T Bell Laboratories, Murray Hill (hosted by Y.Freund),
NJ U.S.A., Feb. 1994. 

9. "A Decision-theoretic Extension of Stochastic Complexity and Its Applications to 
Learning," 

presented for Seminar at University of Tokyo (hosted by Prof.K.Hayami), Tokyo 
Japan, Feb. 1996. 

10. "A Decision-theoretic Extension of Stochastic Complexity and Its Applications to 
Learning," 

presented for Seminar at University of Tokyo (hosted by Prof.Tsujii), Tokyo Japan, 
October 1997. 

11. "Information-Based Induction Sciences,"

presented at Workshop on Mathematical Engineering Methods for Statistical
Information Processing, The Institute of Statist. Math., January 1998. 

12. "Extended Stochastic Complexity and Learning Theory," 

presented for Seminar at Waseda University (hosted by Prof.Matsushima), Tokyo 
Japan, May 1998. 

13. "Extended Stochastic Complexity and Learning Theory," 

presented for Seminar at Electro-Communication University (hosted by Prof. Te-sun 
Han), Tokyo Japan, December 1998. 

14. "Extended Stochastic Complexity and Machine Learning,"
presented for Seminar at University of Tokyo (hosted by Prof.Yamamoto), Tokyo Japan,
May 1999.

15. "Information Mining-Fraud Detection and Text Mining,"

presented at Statistical Sciences and Data Mining, The Institute of Statist. Math.,
Tokyo Japan, October 1999. 

16. "On-line Unsupervised Outlier Detection Using Finite Mixture Models," 

presented at Toward a New Unification of Statistical Sciences, Neural Networks, and 
Data Mining, The Institute of Statist. Math., Tokyo Japan, Nov. 2000. 

17. "Latest Data Mining Technologies with Their Applications to CRM," 

presented at Datawarehouse and CRM Expo. Tutorial Seminar, Tokyo Japan, June 
2001. 

18. "Web Mining," 

presented at JEITA Research Seminar, June 2003. 

19. "Data Mining-Toward Security Intelligence and Knowledge Organization," 
presented at Tsukuba University, July 2004. 



Note: The list here does not include any papers published without being reviewed, except
invited papers. Please contact me directly if you wish to look at them. 

Professional Activities 

Lecturer 

1. A Special Lecture at Graduate School at University of Tokyo from Nov.2000 to Feb.2001.

Committees

1. Member of COLT (Computational Learning Theory) Working Group since 1994.

2. Program committee member on COLT'93 (ACM Conference on Computational Learning
Theory), 1993. 

3. Program committee member on ML'94 (International Conference on Machine Learning), 
1994. 

4. Program committee member on EuroCOLT'95 (European Conference on Computational 
Learning Theory), 1995. 

5. Program committee member on ML'95 (International Conference on Machine Learning), 
1995. 

6. Program committee member on WCNN'95 (World Conference on Neural Networks), 
1995. 

7. Program committee member on ALT'96 (Workshop on Algorithmic Learning Theory), 
1996. 

8. Advisory committee member on COLT'97 (International Conference on Computational
Learning Theory), 1997. 

9. Chair of 1998 Workshop on IBIS'98 (Information-Based Induction Sciences), 1998. 

10. Committee member on Information Theory Society of IEICE (Institute of Electronics,
Information, Communication, and Engineers). 

11. Program committee member on COLT'99 (ACM Conference on Computational Learning
Theory), 1999. 

12. Editor of Special Issue of Information Theory, Statistical Methods, and Machine Learning 
in SICE (Society of Instrument and Control Engineers), 1999. 

13. Program chair on IBIS'99 (Information-Based Induction Sciences), 1999. 

14. Editor of Special Issue of Information-Based Induction Sciences in IEICE (Institute of
Electronics, Information, Communication, and Engineers), 1999.

15. Program committee member on IBIS 2000 (Information-Based Induction Sciences),
2000. 

16. Program committee member on Special Issue of Information-Based Induction Sciences 
in Journal of Japanese Society of Artificial Intelligence, 2000 

17. Chair of Time-Limited Research Committee on Information-Based Induction Sciences, 
IEICE (Institute of Electronics, Information, Communication, and Engineers), Information
Systems Society, 2001-2003. 

18. Program committee member on IBIS 2001 (Information-Based Induction Sciences), 
2001. 

19. Member on Editorial Board on Special Issue of Information-Based Induction Sciences in
IEICE (Institute of Electronics, Information, Communication, and Engineers), 2001.

20. Member on Editorial Board, Fundamentals in IEICE (Institute of Electronics, Information,
Communication, and Engineers), 2001-

21. Member on Society Editorial Board, Information Systems in IEICE (Institute of Electronics,
Information, Communication, and Engineers), 2001-.

22. Member on Editorial Board on Program on Special Issue of Information-Based Induction
Sciences in IEICE (Institute of Electronics, Information, Communication, and Engineers),
2002. 

23. Program committee member on IBIS 2002 (Information-Based Induction Sciences), 
2002. 

24. Program committee member on DS'02 (Conference on Discovery Science), 2002.

25. FIT(Forum on Information Technologies) Program committee member on FIT 2002. 

26. Program committee member on OTC-03 (3rd Workshop on Operational Text 
Classification), 2003

27. Member on Editorial Board on Program on Special Issue of Information-Based Induction
Sciences in IEICE (Institute of Electronics, Information, Communication, and Engineers),
2004. 

28. Co-Chair on IJCNLP2004 (First International Conference on Natural Language 
Processing), 2004. 

29. Program committee member on KDD2004 (ACM Conference on Knowledge Discovery 
and Data Mining). 

30. Chair on 2004 IBIS'04 (Information-Based Induction Sciences), 2004.

31. Program committee member on IJCAI2005. 

32. Program committee member on ALT2005 

33. Program committee member on KDD2005 

34. Steering Committee member on Society on Information Theory and Its Applications, 
JAPAN. 

Referee for Journal Submission 

• IEEE Transactions on Information Theory. 

• IEEE Transactions on Neural Networks. 

• Journal of Computer and System Sciences. 

• Information and Computation. 

• SIAM Journal on Computing. 

• Machine Learning.

• Theoretical Computer Science.

• Information Processing Letters. 

• IEICE (The Institute of Electronics, Information and Communication Engineers)
Transactions. 

• Journal of Japan Society for Fuzzy Theory and Systems 

• Journal of Japan Society for Artificial Intelligence 

Tenure/Dissertation Committees

• Vijay Ragavan-Promoted to Associate Professor with tenure in Vanderbilt University, 
1995. 

• Peter Grunwald-Received Ph.D with the paper "The Minimum Description Length 
Principle and Reasoning Under Uncertainty" 

from Universiteit van Amsterdam, 1999.